
    Smart contracts categorization with topic modeling techniques

    One of the main advantages of the Ethereum blockchain is the possibility of developing smart contracts in a Turing-complete environment. These general-purpose programs provide a higher level of security than traditional contracts and reduce other transaction costs associated with bargaining. Developers use smart contracts to build their tokens and to set up gambling games, crowdsales, ICOs, and many other applications. Since the number of smart contracts on the Ethereum blockchain is in the millions, it is unthinkable to check every program manually to understand its functionality. At the same time, it would be of primary importance to group sets of smart contracts according to their purposes and functionalities. One way to group Ethereum's smart contracts is to use topic modeling techniques, taking advantage of the fact that many programs representing a specific topic are similar in program structure. Starting from a dataset of 130k smart contracts, we built a Latent Dirichlet Allocation (LDA) model to identify the number of topics within our sample. Computing coherence values for different numbers of topics, we found that the optimal number was 15. As we expected, most programs are tokens, games, crowdfunding platforms, and ICOs.
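    A minimal sketch of the coherence-driven topic-count search described above, assuming gensim as the topic-modeling library; the toy token lists stand in for tokenized contract sources, and none of this is the paper's actual code.

        # Hypothetical sketch: pick the number of LDA topics by coherence, as in the study.
        from gensim.corpora import Dictionary
        from gensim.models import LdaModel, CoherenceModel

        # Toy stand-ins for tokenized Solidity sources; the paper uses ~130k contracts.
        texts = [
            ["token", "transfer", "balance", "approve"],
            ["game", "bet", "random", "payout"],
            ["crowdsale", "token", "cap", "rate"],
        ]
        dictionary = Dictionary(texts)
        corpus = [dictionary.doc2bow(t) for t in texts]

        scores = {}
        for k in range(2, 5):  # the paper scans a wider range and reports 15 as optimal
            lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k, random_state=0)
            cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
            scores[k] = cm.get_coherence()

        best_k = max(scores, key=scores.get)
        print(best_k, scores)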

    Automatically evaluating the quality of textual descriptions in cultural heritage records

    Metadata are fundamental for the indexing, browsing and retrieval of cultural heritage resources in repositories, digital libraries and catalogues. In order to be effectively exploited, metadata information has to meet some quality standards, typically defined in the collection usage guidelines. As manually checking the quality of metadata in a repository may not be affordable, especially in large collections, in this paper we specifically address the problem of automatically assessing the quality of metadata, focusing in particular on textual descriptions of cultural heritage items. We describe a novel approach based on machine learning that tackles this problem by framing it as a binary text classification task aimed at evaluating the accuracy of textual descriptions. We report our assessment of different classifiers using a new dataset that we developed, containing more than 100K descriptions. The dataset was extracted from different collections and domains from the Italian digital library "Cultura Italia" and was annotated with accuracy information in terms of compliance with the cataloguing guidelines. The results empirically confirm that our proposed approach can effectively support curators (F1 ∼ 0.85) in assessing the quality of the textual descriptions of the records in their collections and provide some insights into how training data, specifically their size and domain, can affect classification performance.
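    The following is an illustrative sketch of the binary description-quality classification idea, using a TF-IDF plus logistic-regression pipeline from scikit-learn; the paper compares several classifiers and does not prescribe this one, and the example records and labels below are invented.

        # Hypothetical sketch: treat description quality as binary text classification.
        from sklearn.pipeline import make_pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score

        # Invented examples; the real dataset has more than 100K annotated descriptions.
        descriptions = [
            "Oil on canvas depicting the Annunciation, attributed to a Venetian workshop.",
            "photo",
            "Carved wooden altarpiece with traces of gilding, restored in 1952.",
            "n/a",
        ]
        labels = [1, 0, 1, 0]  # 1 = compliant with the cataloguing guidelines

        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
        clf.fit(descriptions, labels)
        print(f1_score(labels, clf.predict(descriptions)))  # the paper reports F1 ~ 0.85 on held-out data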

    La Biblioteca y el Museo de la Facultad de Odontología, Universidad Nacional de Córdoba como espacios culturales: una propuesta integradora.

    This work starts from the recognition of the importance of comprehensive cultural projects, built through education and art, in shaping the institutional identity of the members of a community, in this case a house of higher studies: the Faculty of Dentistry of the National University of Córdoba. We explain how, through an integrative working methodology, short-term and long-term objectives can be pursued by forming a working group nourished by perspectives from different professions. The merger of the Library and the Museum of the Faculty of Dentistry of the UNC, and the activities carried out since its start-up, aim to serve as a basis of experience for other institutions of the National University of Córdoba that want to adopt this integrative vision, as well as for other documentation centres in the country interested in experimenting with this proposal.

    The impact of phrases on Italian lexical simplification

    Automated lexical simplification has so far focused only on the replacement of single tokens with single tokens, and this choice has affected both the development of systems and the creation of benchmarks. In this paper, we argue that lexical simplification in real settings should deal with both single- and multi-token terms, and we present a benchmark created for the task. In addition, we describe how a freely available system can be tuned to also cover the simplification of phrases, and we perform an evaluation comparing different experimental settings.
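    As an illustration of the single-token versus multi-token point (not the system evaluated in the paper), the sketch below matches candidate substitutions greedily on the longest span first, so phrases are simplified as units rather than word by word; the dictionary entries are invented.

        # Hypothetical sketch: phrase-aware lexical simplification by longest-match lookup.
        simplifications = {
            ("in", "relazione", "a"): "riguardo a",  # invented multi-token entry
            ("ausilio",): "aiuto",                   # invented single-token entry
        }

        def simplify(tokens):
            out, i = [], 0
            while i < len(tokens):
                for n in (3, 2, 1):  # prefer the longest matching span
                    span = tuple(tokens[i:i + n])
                    if span in simplifications:
                        out.append(simplifications[span])
                        i += n
                        break
                else:
                    out.append(tokens[i])
                    i += 1
            return out

        print(simplify("chiedere ausilio in relazione a questo".split()))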

    Smart Contracts Software Metrics: a First Study

    © 2018 The Author(s). Smart contracts (SCs) are software programs that reside and run on a blockchain. The code can be written in different languages with the common purpose of implementing various kinds of transactions on the hosting blockchain. They are governed by the blockchain infrastructure and work to satisfy conditions typical of traditional contracts. The code must satisfy strongly context-dependent constraints that are quite different from those of traditional software. In particular, since the bytecode is uploaded to the hosting blockchain, size, computational resources, and interaction between different parts of the software are all limited, and even though the specific languages implement more or less the same constructs as traditional languages, there is not the same freedom as in normal software development. These constraints are expected to be reflected in SC software metrics, which should display values characteristic of the domain and different from those of more traditional software. We tested this hypothesis on the code of more than twelve thousand SCs written in Solidity and deployed on the Ethereum blockchain. We downloaded the SCs from a public repository, computed statistics for a set of SC-related software metrics, and compared them to the metrics extracted from more traditional software projects. Our results show that smart contract metrics generally have more restricted ranges than the corresponding metrics in traditional software systems. Some of the stylized facts, such as the power-law tail in the distribution of some metrics, hold only approximately, but the lines of code follow a log-normal distribution, reminiscent of the behavior already found in traditional software systems.
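    A rough sketch of the kind of measurement the study performs, limited here to lines of code per contract and a log-normal fit with SciPy; the contracts/*.sol path is an assumed local dump of contract sources, not the repository used by the authors.

        # Hypothetical sketch: LOC per Solidity contract, fitted to a log-normal distribution.
        import glob
        from scipy import stats

        loc_counts = []
        for path in glob.glob("contracts/*.sol"):  # assumed local dump of contract sources
            with open(path, encoding="utf-8", errors="ignore") as f:
                loc_counts.append(sum(1 for line in f if line.strip()))

        if loc_counts:
            shape, _, scale = stats.lognorm.fit(loc_counts, floc=0)
            print("log-normal fit:", shape, scale)  # the study finds LOC approximately log-normal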

    How do you propose your code changes? Empirical analysis of affect metrics of pull requests on GitHub

    Software engineering methodologies rely on version control systems such as git to store source code artifacts and manage changes to the codebase. Pull requests include chunks of source code, the history of changes, log messages around a proposed change to the mainstream codebase, and much discussion on whether to integrate such changes or not. A better understanding of what contributes to a pull request's fate and latency will allow us to build predictive models of what is going to happen and when. Several factors can influence the acceptance of pull requests, many of which are related to individual aspects of software developers. In this study, we aim to understand how the affect (e.g., sentiment, discrete emotions, and valence-arousal-dominance dimensions) expressed in the discussion of pull request issues influences the acceptance of pull requests. We conducted a mining study of large git software repositories and analyzed more than 150,000 issues with more than 1,000,000 comments in them. We built a model to understand whether affect and politeness have an impact on the chance of issues and pull requests being merged, i.e., of the code which fixes the issue being integrated into the codebase. We built two logistic classifiers, one without affect metrics and one with them. By comparing the two classifiers, we show that the affect metrics improve the prediction performance. Our results show that valence (expressed in comments received and posted by a reporter) and joy expressed in the comments written by a reporter are linked to a higher likelihood of issues being merged. On the contrary, sadness, anger, and arousal expressed in the comments written by a reporter, and anger, arousal, and dominance expressed in the comments received by a reporter, are linked to a lower likelihood of a pull request being merged.
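    A hedged sketch of the comparison design (not the study's actual pipeline): one logistic model is trained on baseline features only and one with affect metrics added, and their cross-validated performance is compared; every feature value below is synthetic.

        # Hypothetical sketch: does adding affect metrics improve merge prediction?
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n = 500
        baseline = rng.normal(size=(n, 3))  # e.g. number of comments, change size, latency
        affect = rng.normal(size=(n, 4))    # e.g. valence, arousal, dominance, joy
        merged = (baseline[:, 0] + affect[:, 0] + rng.normal(size=n) > 0).astype(int)

        auc_base = cross_val_score(LogisticRegression(), baseline, merged, scoring="roc_auc", cv=5).mean()
        auc_full = cross_val_score(LogisticRegression(), np.hstack([baseline, affect]), merged, scoring="roc_auc", cv=5).mean()
        print(auc_base, auc_full)           # the study finds affect metrics improve prediction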

    Empowering NGOs in Countering Online Hate Messages

    Studies on online hate speech have mostly focused on the automated detection of harmful messages. Little attention has been devoted so far to the development of effective strategies to fight hate speech, in particular through the creation of counter-messages. While existing manual scrutiny and intervention strategies are time-consuming and not scalable, advances in natural language processing have the potential to provide a systematic approach to hatred management. In this paper, we introduce a novel ICT platform that NGO operators can use to monitor and analyze social media data, along with a counter-narrative suggestion tool. Our platform aims at increasing the efficiency and effectiveness of operators' activities against Islamophobia. We test the platform with more than one hundred NGO operators in three countries through qualitative and quantitative evaluation. Results show that NGOs favor the platform solution with the suggestion tool, and that the time required to produce counter-narratives significantly decreases.